Recognition of Phonemes in A-cappella Recordings using Temporal Patterns and Mel Frequency Cepstral Coefficients

Author

  • Jens Kofod Hansen
Abstract

In this paper, a new method for recognizing phonemes in singing is proposed. Unlike regular speech recognition, phoneme recognition in singing has not yet matured to a standardized method. The standard methods for regular speech recognition have already been evaluated on vocal recordings, but they perform worse there than on regular speech. Two alternative classification methods addressing this issue are proposed: one uses Mel-Frequency Cepstral Coefficient features, while the other uses Temporal Patterns. They are combined to create a new type of classifier that performs better than either classifier alone. The classifications are performed on US English songs. The preliminary result is a phoneme recall rate of 48.01%, averaged over all audio frames within a song.

Similar resources

Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients

The Vocal Joystick Vowel Corpus, by Washington University, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748 sound files from the corpus were used and su...

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...

Measuring Acoustic Reduction in Feature Space

Modelling varying speaking style remains a challenge for state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlates of speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variabil...

A Comparative Study Of LPCC And MFCC Features For The Recognition Of Assamese Phonemes

In this paper, two popular feature extraction techniques, Linear Predictive Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC), have been investigated and their performances have been evaluated for the recognition of Assamese phonemes. A multilayer-perceptron-based baseline phoneme recognizer has been built and all the experiments have been carried out using that recognize...

The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Speech recognition makes an important contribution to promoting new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several stages before the desired output is obtained. Among automatic speech recognition (ASR) components is the feature ex...

Publication date: 2012